Search results for "text corpus"

showing 10 items of 14 documents

A Study on Classification Methods Applied to Sentiment Analysis

2013

Sentiment analysis is a new area of research in data mining that concerns the detection of opinions and/or sentiments in texts. This work applies and compares three classification techniques over a text corpus composed of reviews of commercial products in order to detect opinions about them. The chosen domain is "perfumes", and the user opinions composing the corpus are written in Italian. The proposed approach is completely data-driven: a Term Frequency / Inverse Document Frequency (TF-IDF) term selection procedure has been applied in order to make computation more efficient, to improve the classification results and to manage some issues related to t…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Text corpus; Naive Bayes classifier; Computer science; business.industry; Sentiment analysis; TF-IDF; Sentiment Classification; computer.software_genre; Class Association Rules; Domain (software engineering); Naive Bayes classifier; Random indexing; Artificial Intelligence; Selection (linguistics); One-class classification; Artificial intelligence; Random Indexing; business; tf–idf; computer; Natural language processing

researchProduct
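
As a rough illustration of the TF-IDF term selection mentioned in the abstract above, the following Python sketch scores terms and keeps the highest-scoring ones. The toy reviews, the function names `tfidf_scores` and `select_terms`, the whitespace tokenizer, and the choice to rank each term by its maximum TF-IDF score across documents are all assumptions made for illustration; the truncated abstract does not describe the authors' actual procedure.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document.

    tf = term count / document length; idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_terms(docs, k):
    """Keep the k terms with the highest TF-IDF score in any document.

    Ranking by the per-term maximum is an illustrative assumption.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus (not taken from the paper).
reviews = [
    "the perfume is lovely and the scent lasts",
    "the perfume is awful",
    "the bottle is lovely",
]
selected = select_terms(reviews, 3)
```

Terms that occur in every document (here "the" and "is") get an inverse document frequency of zero, so they drop out of the selection, which is the sense in which such a filter can shrink the feature space before classification.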

Analysis and Comparison of Deep Learning Networks for Supporting Sentiment Mining in Text Corpora

2020

In this paper, we tackle the problem of irony and sarcasm detection for the Italian language to contribute to the enrichment of the sentiment analysis field. We analyze and compare five deep-learning systems. Results show that such systems are highly suitable for the problem, achieving an F1-score of 93% in the best case. Furthermore, we briefly analyze the model architectures in order to choose the best compromise between performance and complexity.

Text corpus; Computer science; media_common.quotation_subject; Compromise; Face (sociological concept); 02 engineering and technology; computer.software_genre; Field (computer science); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; natural language processing; media_common; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; Sarcasm; business.industry; Deep learning; Sentiment analysis; deep learning; irony detection; Irony; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; sarcasm detection; Natural language processing; Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services

Automatic Dictionary Creation by Sub-symbolic Encoding of Words

2006

This paper describes a technique for the automatic creation of dictionaries using sub-symbolic representations of words in a cross-language context. Semantic relationships among words of two languages are extracted from aligned bilingual text corpora. This feature is obtained by applying the Latent Semantic Analysis technique to the matrices representing term co-occurrences in aligned text fragments. The technique makes it possible to find the “best translation” according to a properly defined geometric distance in an automatically created semantic space. Experiments show a correctness of 95% in the best case.

Text corpus; Correctness; Probabilistic latent semantic analysis; Computer science; Latent semantic analysis; business.industry; Context (language use); Translation (geometry); computer.software_genre; Feature (linguistics); Artificial intelligence; business; Representation (mathematics); computer; Natural language processing

Syntagmatic and Paradigmatic Associations in Information Retrieval

2003

It is shown that unconscious associative processes taking place in the memory of a searcher during the formulation of a search query in information retrieval — such as the production of free word associations and the generation of synonyms — can be simulated using statistical models that analyze the distribution of words in large text corpora. The free word associations as produced by subjects on presentation of stimulus words can be predicted by applying first-order statistics to the frequencies of word co-occurrences as observed in texts. The generation of synonyms can also be conducted on co-occurrence data but requires second-order statistics. Both approaches are compared and validated …

Text corpus; Empirical data; Syntagmatic analysis; Information retrieval; Web search query; Semantic similarity; Computer science; Statistical model; Independent component analysis; Associative property
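
The distinction drawn in the abstract above between first-order and second-order co-occurrence statistics can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the sentence-level co-occurrence window, the use of raw counts instead of a significance measure, the cosine similarity, the toy sentences, and all function names are hypothetical choices, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each pair of distinct words shares a sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def first_order_associates(counts, word, k):
    """First-order: rank words by raw co-occurrence count with `word`."""
    neighbours = counts.get(word, Counter())
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:k]]


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def second_order_neighbours(counts, word, k):
    """Second-order: rank words by similarity of their co-occurrence vectors."""
    target = counts.get(word, Counter())
    sims = [(cosine(target, vector), other)
            for other, vector in counts.items() if other != word]
    sims.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in sims[:k]]


# Hypothetical toy corpus (not taken from the paper).
sentences = [
    "doctor treats patient",
    "physician treats patient",
    "doctor examines patient",
    "physician examines patient",
    "doctor visits hospital",
]
counts = cooccurrence_counts(sentences)
associates = first_order_associates(counts, "doctor", 1)
synonym_like = second_order_neighbours(counts, "doctor", 1)
```

In this toy corpus "doctor" and "physician" never appear in the same sentence, so first-order counts cannot link them, while their similar co-occurrence vectors make "physician" the nearest second-order neighbour of "doctor".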

Methodological Approach for Messages Classification on Twitter Within E-Government Area

2018

The constant growth in the number of Social Media users is a reality of the past few years. Companies, governments and researchers focus on extracting useful data from Social Media. One of the most important things we can extract from the messages transmitted from one user to another is the sentiment (positive, negative or neutral) regarding the subject of the conversation. There are many studies on how to classify these messages, but all of them need a huge amount of already classified data for training, which is not available for Romanian-language texts. We present a case study in which we apply a Naive Bayes classifier trained on an English short-text corpus to several thousand Romanian texts. …

Text corpus; Focus (computing); Computer science; Romanian; media_common.quotation_subject; Subject (documents); language.human_language; World Wide Web; Naive Bayes classifier; Constant (computer programming); language; Social media; Conversation; media_common

Review of Non-English Corpora Annotated for Emotion Classification in Text

2020

In this paper we try to systematize the information about the available corpora for emotion classification in text for languages other than English, with the goal of finding which approaches could be used for low-resource languages with close to no existing work in the field. We analyze the volume, emotion classification schema and language of each corpus, as well as the methods employed for data preparation and annotation automation. We systematized twenty-four papers presenting the corpora and found that the corpora mostly cover the most widely spoken world languages: Hindi, Chinese, Turkish, Arabic, Japanese, etc. A typical corpus contained several thousand manually annotated ent…

Text corpus; Hindi; Artificial neural network; Turkish; Computer science; business.industry; Emotion classification; computer.software_genre; language.human_language; Annotation; Naive Bayes classifier; ComputingMethodologies_PATTERNRECOGNITION; Schema (psychology); language; Artificial intelligence; business; computer; Natural language processing

A Methodology for Bilingual Lexicon Extraction from Comparable Corpora

2015

Dictionary extraction using parallel corpora is well established. However, for many language pairs parallel corpora are a scarce resource, which is why in the current work we discuss methods for dictionary extraction from comparable corpora. The aim is to push the boundaries of current approaches, which typically utilize correlations between co-occurrence patterns across languages, in several ways: 1) Eliminating the need for initial lexicons by using a bootstrapping approach which only requires a few seed translations. 2) Implementing a new approach which first establishes alignments between comparable documents across languages, and then computes cross-lingual alignments between wor…

Text corpus; Interlingua; Computer science; business.industry; media_common.quotation_subject; Bootstrapping (linguistics); computer.software_genre; language.human_language; Parallel corpora; Bilingual lexicon; Resource (project management); language; Quality (business); Artificial intelligence; business; computer; Word (computer architecture); Natural language processing; media_common; Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra)

Reflection Assignment as a Tool to Support Students’ Metacognitive Awareness in the Context of Computer-Supported Collaborative Learning

2021

The present study explores the potential of a reflection assignment as a tool for supporting master’s degree students’ metacognitive skills in the context of computer-supported collaborative learning (CSCL). The research question (RQ) is formulated as follows: How does a regularly submitted reflection assignment support the development of students’ individual metacognitive awareness in the context of CSCL? The empirical data is a text corpus (7878 words) extracted from individual students’ (N = 13) reflection assignments (N = 65) submitted during one semester. Qualitative content analysis was employed to analyze the data. The results demonstrate that by the end of the course, the students s…

Text corpus; Reflection (computer programming); 05 social sciences; 050301 education; Metacognition; 050109 social psychology; Collaborative learning; Context (language use); computer.software_genre; Scripting language; Computer-supported collaborative learning; ComputingMilieux_COMPUTERSANDEDUCATION; Mathematics education; 0501 psychology and cognitive sciences; Psychology; 0503 education; computer; Research question

Graph-based exploration and clustering analysis of semantic spaces

2019

The goal of this study is to demonstrate how network science and graph theory tools and concepts can be effectively used for exploring and comparing semantic spaces of word embeddings and lexical databases. Specifically, we construct semantic networks based on word2vec representation of words, which is “learnt” from large text corpora (Google news, Amazon reviews), and “human built” word networks derived from the well-known lexical databases: WordNet and Moby Thesaurus. We compare “global” (e.g., degrees, distances, clustering coefficients) and “local” (e.g., most central nodes and community-type dense clusters) characteristics of the considered networks. Our observations suggest that …

Text corpus; Semantic spaces; Computer Networks and Communications; Computer science; graph theory; 0211 other engineering and technologies; WordNet; Network science; 02 engineering and technology; semanttinen web; Semantic network; word2vec similarity networks; Word2vec similarity networks; Clique relaxations; cohesive clusters; 0202 electrical engineering electronic engineering information engineering; Word2vec; Cluster analysis; Thesaurus (information retrieval); 021103 operations research; Multidisciplinary; Information retrieval; verkkoteoria; lcsh:T57-57.97; Graph theory; cliques; Graph theory; clique relaxations; Computational Mathematics; Cliques; lcsh:Applied mathematics. Quantitative methods; semantic spaces; 020201 artificial intelligence & image processing; Cohesive clusters

Supporting Emotion Automatic Detection and Analysis over Real-Life Text Corpora via Deep Learning: Model, Methodology, and Framework

2021

This paper describes an approach for supporting automatic satire detection through an effective deep learning (DL) architecture that has been shown to be useful for addressing sarcasm/irony detection problems. We both trained and tested the system using articles drawn from two important satirical blogs, Lercio and IlFattoQuotidiano, and from significant Italian newspapers.

Text corpus; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; Settore INF/01 - Informatica; Computer science; business.industry; Deep learning; computer.software_genre; NLP; Deep Learning; Artificial intelligence; Satire Detection; business; computer; Natural language processing
